In our initial discussion about choosing a project topic, we narrowed
our options to environmental data because we were interested in seeing
possible trends over time and because a vast quantity of environmental
data is available.
Introduction
- Air pollution can be detrimental to both our health and the climate
- Outdoor and indoor air pollution cause chronic pain, respiratory
diseases, shortened lifespan
- Air pollution kills about 7 million people worldwide every year
- Hopefully this information conveys how serious air pollution is and
why we should be more mindful of our planet
- Overview
- We will see how different air pollution types affect the
population
- compare past and present population numbers
- determine which air pollutant type has the highest associated death
rate
Packages Required
#This will allow us to filter through our data
library(tidyverse)
library(dplyr)
#This will help us plot figures to showcase our findings
library(ggplot2)
#This will help us organize and display our data as necessary
library(knitr)
library(kableExtra)
#This expands our plot uses
library(plotly)
#Scientific Notation Disabled (a large scipen penalty makes R prefer fixed notation)
options(scipen = 999)
Deaths Data
We were excited to do our report over this data because it was
relatively tidy and had quite a few categorical variables and options
for additional columns to graph.
Our deaths-due-to-air-pollution data set is from Kaggle. It was
published by Akshat Giri and was last updated 2 years ago, so it is
fairly current. When we first loaded the data, some of the column names
were lengthy, so we shortened them to: country, acronym, year,
total_deaths, indoor_deaths, outdoor_deaths, and ozone_deaths.
Import the deaths-due-to-air-pollution data
deaths_df_old <- read.csv("death-rates-from-air-pollution.csv")
glimpse(deaths_df_old)
## Rows: 6,468
## Columns: 7
## $ Entity <chr> "Afghanistan", "Afghan…
## $ Code <chr> "AFG", "AFG", "AFG", "…
## $ Year <int> 1990, 1991, 1992, 1993…
## $ Air.pollution..total...deaths.per.100.000. <dbl> 299.4773, 291.2780, 27…
## $ Indoor.air.pollution..deaths.per.100.000. <dbl> 250.3629, 242.5751, 23…
## $ Outdoor.particulate.matter..deaths.per.100.000. <dbl> 46.44659, 46.03384, 44…
## $ Outdoor.ozone.pollution..deaths.per.100.000. <dbl> 5.616442, 5.603960, 5.…
We are going to rename the columns and glimpse the data
deaths_df <- deaths_df_old %>%
  rename(
    country = Entity,
    acronym = Code,
    year = Year,
    total_deaths = Air.pollution..total...deaths.per.100.000.,
    indoor_deaths = Indoor.air.pollution..deaths.per.100.000.,
    outdoor_deaths = Outdoor.particulate.matter..deaths.per.100.000.,
    ozone_deaths = Outdoor.ozone.pollution..deaths.per.100.000.
  )
glimpse(deaths_df)
## Rows: 6,468
## Columns: 7
## $ country <chr> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanist…
## $ acronym <chr> "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG",…
## $ year <int> 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1…
## $ total_deaths <dbl> 299.4773, 291.2780, 278.9631, 278.7908, 287.1629, 288.0…
## $ indoor_deaths <dbl> 250.3629, 242.5751, 232.0439, 231.6481, 238.8372, 239.9…
## $ outdoor_deaths <dbl> 46.44659, 46.03384, 44.24377, 44.44015, 45.59433, 45.36…
## $ ozone_deaths <dbl> 5.616442, 5.603960, 5.611822, 5.655266, 5.718922, 5.739…
Data Variables
Variables that interest us here include:
- country
- total_deaths: total air pollution deaths per 100,000 people
- indoor_deaths: Indoor air pollution is pollution that occurs in the
household, mainly from cooking with solid fuels:
  - Wood
  - Crop waste, dung
  - Charcoal, coal

  This method of cooking is commonly seen in underdeveloped countries.
- outdoor_deaths: Outdoor (ambient) air pollution consists of emissions
caused by combustion processes from motor vehicles, solid-fuel burning,
and industry:
  - Ozone (O3)
  - Particulate matter (PM10 and PM2.5)
  - Nitrogen dioxide (NO2)
  - Carbon monoxide (CO)
  - Sulfur dioxide (SO2)

  The data set takes a closer look at deaths caused by ozone itself,
  which is considered a component of outdoor air pollution.
- ozone_deaths: Ozone is a gas that occurs both in Earth's upper
atmosphere and at ground level. Ozone in the upper atmosphere is helpful
because it shields the surface from ultraviolet radiation, but
ground-level ozone is a harmful pollutant that forms from emissions
produced by burning fossil fuels:
  - Pollutants emitted by cars
  - Power plants, industrial boilers, refineries, chemical plants
World Population Data
The world population data set is also from Kaggle. It was published by
Devakumar K. P. and was last updated 2 years ago, so it is also recent.
A glimpse of the data set shows three columns: country name, year, and
count, which is the population in that year.
Now, let's take a look at the population data.
world_pop <- read.csv("population_total_long.csv")
glimpse(world_pop)
## Rows: 12,595
## Columns: 3
## $ Country.Name <chr> "Aruba", "Afghanistan", "Angola", "Albania", "Andorra", "…
## $ Year <int> 1960, 1960, 1960, 1960, 1960, 1960, 1960, 1960, 1960, 196…
## $ Count <int> 54211, 8996973, 5454933, 1608800, 13411, 92418, 20481779,…
To get a general idea of the deaths data frame we made, let's make a
plot to see what's happening. This is a plot of indoor versus outdoor
deaths around the world by country.
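The plotting code is not shown in this report. A minimal sketch of how such a scatter plot could be built with ggplot2 follows. It uses a small inline sample (1996 values for Australia and Canada taken from this report's tables) rather than the full `deaths_df`, and the object names `deaths_sample` and `indoor_outdoor_plot` are assumptions made for illustration.

```r
library(ggplot2)

# Illustrative sample only: 1996 rows for two countries from this report's tables.
# With the full data, deaths_df would take the place of deaths_sample.
deaths_sample <- data.frame(
  country = c("Australia", "Canada"),
  indoor_deaths = c(0.3585034, 0.0946226),
  outdoor_deaths = c(22.407071, 20.155243),
  stringsAsFactors = FALSE
)

# Scatter plot of indoor against outdoor deaths, one color per country
indoor_outdoor_plot <- ggplot(
  deaths_sample,
  aes(x = indoor_deaths, y = outdoor_deaths, color = country)
) +
  geom_point() +
  labs(
    x = "Indoor deaths per 100,000",
    y = "Outdoor deaths per 100,000",
    title = "Indoor vs. outdoor air pollution deaths"
  )

indoor_outdoor_plot
```

With every country in the data the legend becomes unreadable, which is one reason to narrow the selection before plotting.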
This is a mess, so we chose two countries from each continent (a
high-population and a low-population country) to graph.
We wanted a consistent ratio between our high- and low-population
countries, so we came up with a rule: the low-population country should
have about 0.10 times the population of the high-population country. For
example, when we chose the U.S. (which had a population of 331002651 at
the time the data was recorded), we multiplied this number by 0.10 to
get 33100265.1 and looked for the country whose population most closely
matched (in this case Canada, with 37742154).
We purposefully left out countries whose populations were much higher
than the majority (Russia, India, and China) because we didn't want
those countries to skew the data.
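The matching rule can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the code used for the report: the helper `closest_low_population()` is hypothetical, and the candidate list uses the 1996 populations that appear in this report's population tables rather than the figures quoted in the example above.

```r
# 1996 populations of the low-population candidates, copied from this report's tables
candidates <- data.frame(
  country = c("Canada", "Chile", "Sri Lanka", "Malawi", "New Zealand", "Serbia"),
  population = c(29610218, 14587370, 18367288, 10022789, 3732000, 7617794),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the candidate whose population is closest to
# `ratio` times the population of the high-population country
closest_low_population <- function(high_population, candidates, ratio = 0.10) {
  target <- high_population * ratio
  candidates[which.min(abs(candidates$population - target)), ]
}

us_1996 <- 269394000  # United States, 1996, from this report's tables
match_for_us <- closest_low_population(us_1996, candidates)
match_for_us$country
```

Restricting the candidates to one continent before calling the helper would mirror the per-continent pairing described above.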
| Country.Name | Year | Count |
| --- | --- | --- |
| Australia | 1996 | 18311000 |
| Brazil | 1996 | 164614688 |
| Germany | 1996 | 81914831 |
| Nigeria | 1996 | 110668794 |
| Pakistan | 1996 | 127349290 |
| United States | 1996 | 269394000 |

| Country.Name | Year | Count |
| --- | --- | --- |
| Canada | 1996 | 29610218 |
| Chile | 1996 | 14587370 |
| Sri Lanka | 1996 | 18367288 |
| Malawi | 1996 | 10022789 |
| New Zealand | 1996 | 3732000 |
| Serbia | 1996 | 7617794 |

Continents:
- North America: U.S., Canada
- South America: Brazil, Chile
- Africa: Nigeria, Malawi
- Europe: Germany, Serbia
- Asia: Pakistan, Sri Lanka
- Oceania: Australia, New Zealand
Combine Data Sets
First let’s look at a table of the high and low populated countries
using the world population data set.
| Country.Name | Year | Count |
| --- | --- | --- |
| Australia | 1996 | 18311000 |
| Brazil | 1996 | 164614688 |
| Germany | 1996 | 81914831 |
| Nigeria | 1996 | 110668794 |
| Pakistan | 1996 | 127349290 |
| United States | 1996 | 269394000 |

| Country.Name | Year | Count |
| --- | --- | --- |
| Canada | 1996 | 29610218 |
| Chile | 1996 | 14587370 |
| Sri Lanka | 1996 | 18367288 |
| Malawi | 1996 | 10022789 |
| New Zealand | 1996 | 3732000 |
| Serbia | 1996 | 7617794 |

Next, we are going to see the death count for high and low populated
countries using the deaths dataframe.
| country | acronym | year | total_deaths | indoor_deaths | outdoor_deaths | ozone_deaths |
| --- | --- | --- | --- | --- | --- | --- |
| Australia | AUS | 1996 | 23.04465 | 0.3585034 | 22.407071 | 0.3249375 |
| Australia | AUS | 1997 | 22.43025 | 0.3222224 | 21.838737 | 0.3141838 |
| Australia | AUS | 1998 | 21.50529 | 0.2839769 | 20.960276 | 0.3048918 |
| Australia | AUS | 1999 | 20.40911 | 0.2590092 | 19.897091 | 0.2953354 |
| Australia | AUS | 2000 | 19.39822 | 0.2398763 | 18.909240 | 0.2899216 |
| Australia | AUS | 2001 | 18.58572 | 0.2234341 | 18.118700 | 0.2836469 |
| Australia | AUS | 2002 | 18.11849 | 0.2105980 | 17.662269 | 0.2859938 |
| Australia | AUS | 2003 | 17.23830 | 0.1937083 | 16.802536 | 0.2816949 |
| Australia | AUS | 2004 | 16.34770 | 0.1760229 | 15.932077 | 0.2785466 |
| Australia | AUS | 2005 | 15.41337 | 0.1599279 | 15.016089 | 0.2757150 |
| Australia | AUS | 2006 | 14.92239 | 0.1496469 | 14.530223 | 0.2819060 |
| Australia | AUS | 2007 | 14.92140 | 0.1449723 | 14.514884 | 0.3042005 |
| Australia | AUS | 2008 | 14.64683 | 0.1383225 | 14.228709 | 0.3254648 |
| Australia | AUS | 2009 | 14.11563 | 0.1259313 | 13.694572 | 0.3431982 |
| Australia | AUS | 2010 | 13.57171 | 0.1174834 | 13.140380 | 0.3647233 |
| Australia | AUS | 2011 | 13.72763 | 0.1119247 | 13.276676 | 0.3956796 |
| Australia | AUS | 2012 | 12.65973 | 0.1018626 | 12.196401 | 0.4192914 |
| Australia | AUS | 2013 | 11.87449 | 0.0973836 | 11.384154 | 0.4530427 |
| Australia | AUS | 2014 | 11.47268 | 0.0931036 | 10.939491 | 0.5037056 |
| Australia | AUS | 2015 | 11.27679 | 0.0886376 | 10.702072 | 0.5544068 |
| Australia | AUS | 2016 | 10.58644 | 0.0844017 | 9.974549 | 0.5955779 |
| Australia | AUS | 2017 | 10.79595 | 0.0833628 | 10.128111 | 0.6592419 |

| country | acronym | year | total_deaths | indoor_deaths | outdoor_deaths | ozone_deaths |
| --- | --- | --- | --- | --- | --- | --- |
| Canada | CAN | 1996 | 22.18101 | 0.0946226 | 20.155243 | 2.192488 |
| Canada | CAN | 1997 | 21.92768 | 0.0877542 | 19.908473 | 2.195940 |
| Canada | CAN | 1998 | 21.65538 | 0.0824492 | 19.634839 | 2.205681 |
| Canada | CAN | 1999 | 21.17703 | 0.0751278 | 19.179045 | 2.189426 |
| Canada | CAN | 2000 | 20.26486 | 0.0681836 | 18.326999 | 2.127733 |
| Canada | CAN | 2001 | 19.82451 | 0.0641108 | 17.938427 | 2.076464 |
| Canada | CAN | 2002 | 19.52428 | 0.0604824 | 17.669133 | 2.047603 |
| Canada | CAN | 2003 | 19.17033 | 0.0564743 | 17.338627 | 2.026864 |
| Canada | CAN | 2004 | 18.40919 | 0.0513588 | 16.629516 | 1.973025 |
| Canada | CAN | 2005 | 17.79268 | 0.0481667 | 16.030102 | 1.954712 |
| Canada | CAN | 2006 | 17.14391 | 0.0447622 | 15.445519 | 1.888735 |
| Canada | CAN | 2007 | 16.93196 | 0.0435468 | 15.229981 | 1.895259 |
| Canada | CAN | 2008 | 16.51814 | 0.0407468 | 14.829238 | 1.883242 |
| Canada | CAN | 2009 | 15.76760 | 0.0380831 | 14.118647 | 1.838920 |
| Canada | CAN | 2010 | 14.88338 | 0.0340653 | 13.281852 | 1.786430 |
| Canada | CAN | 2011 | 14.59934 | 0.0319160 | 13.030477 | 1.756998 |
| Canada | CAN | 2012 | 13.82968 | 0.0307105 | 12.243601 | 1.764727 |
| Canada | CAN | 2013 | 12.97501 | 0.0288027 | 11.410021 | 1.733997 |
| Canada | CAN | 2014 | 12.61872 | 0.0276959 | 11.032571 | 1.746991 |
| Canada | CAN | 2015 | 12.21793 | 0.0270578 | 10.609097 | 1.763895 |
| Canada | CAN | 2016 | 11.00267 | 0.0251286 | 9.397502 | 1.740834 |
| Canada | CAN | 2017 | 10.71662 | 0.0247705 | 9.110733 | 1.739718 |

Lastly, we will join the population and deaths data by country and
year.

| country | acronym | year | total_deaths | indoor_deaths | outdoor_deaths | ozone_deaths | Count |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Australia | AUS | 1996 | 23.04465 | 0.3585034 | 22.407071 | 0.3249375 | 18311000 |
| Australia | AUS | 1997 | 22.43025 | 0.3222224 | 21.838737 | 0.3141838 | 18517000 |
| Australia | AUS | 1998 | 21.50529 | 0.2839769 | 20.960276 | 0.3048918 | 18711000 |
| Australia | AUS | 1999 | 20.40911 | 0.2590092 | 19.897091 | 0.2953354 | 18926000 |
| Australia | AUS | 2000 | 19.39822 | 0.2398763 | 18.909240 | 0.2899216 | 19153000 |
| Australia | AUS | 2001 | 18.58572 | 0.2234341 | 18.118700 | 0.2836469 | 19413000 |
| Australia | AUS | 2002 | 18.11849 | 0.2105980 | 17.662269 | 0.2859938 | 19651400 |
| Australia | AUS | 2003 | 17.23830 | 0.1937083 | 16.802536 | 0.2816949 | 19895400 |
| Australia | AUS | 2004 | 16.34770 | 0.1760229 | 15.932077 | 0.2785466 | 20127400 |
| Australia | AUS | 2005 | 15.41337 | 0.1599279 | 15.016089 | 0.2757150 | 20394800 |
| Australia | AUS | 2006 | 14.92239 | 0.1496469 | 14.530223 | 0.2819060 | 20697900 |
| Australia | AUS | 2007 | 14.92140 | 0.1449723 | 14.514884 | 0.3042005 | 20827600 |
| Australia | AUS | 2008 | 14.64683 | 0.1383225 | 14.228709 | 0.3254648 | 21249200 |
| Australia | AUS | 2009 | 14.11563 | 0.1259313 | 13.694572 | 0.3431982 | 21691700 |
| Australia | AUS | 2010 | 13.57171 | 0.1174834 | 13.140380 | 0.3647233 | 22031750 |
| Australia | AUS | 2011 | 13.72763 | 0.1119247 | 13.276676 | 0.3956796 | 22340024 |
| Australia | AUS | 2012 | 12.65973 | 0.1018626 | 12.196401 | 0.4192914 | 22733465 |
| Australia | AUS | 2013 | 11.87449 | 0.0973836 | 11.384154 | 0.4530427 | 23128129 |
| Australia | AUS | 2014 | 11.47268 | 0.0931036 | 10.939491 | 0.5037056 | 23475686 |
| Australia | AUS | 2015 | 11.27679 | 0.0886376 | 10.702072 | 0.5544068 | 23815995 |
| Australia | AUS | 2016 | 10.58644 | 0.0844017 | 9.974549 | 0.5955779 | 24190907 |
| Australia | AUS | 2017 | 10.79595 | 0.0833628 | 10.128111 | 0.6592419 | 24601860 |

| country | acronym | year | total_deaths | indoor_deaths | outdoor_deaths | ozone_deaths | Count |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Canada | CAN | 1996 | 22.18101 | 0.0946226 | 20.155243 | 2.192488 | 29610218 |
| Canada | CAN | 1997 | 21.92768 | 0.0877542 | 19.908473 | 2.195940 | 29905948 |
| Canada | CAN | 1998 | 21.65538 | 0.0824492 | 19.634839 | 2.205681 | 30155173 |
| Canada | CAN | 1999 | 21.17703 | 0.0751278 | 19.179045 | 2.189426 | 30401286 |
| Canada | CAN | 2000 | 20.26486 | 0.0681836 | 18.326999 | 2.127733 | 30685730 |
| Canada | CAN | 2001 | 19.82451 | 0.0641108 | 17.938427 | 2.076464 | 31020902 |
| Canada | CAN | 2002 | 19.52428 | 0.0604824 | 17.669133 | 2.047603 | 31360079 |
| Canada | CAN | 2003 | 19.17033 | 0.0564743 | 17.338627 | 2.026864 | 31644028 |
| Canada | CAN | 2004 | 18.40919 | 0.0513588 | 16.629516 | 1.973025 | 31940655 |
| Canada | CAN | 2005 | 17.79268 | 0.0481667 | 16.030102 | 1.954712 | 32243753 |
| Canada | CAN | 2006 | 17.14391 | 0.0447622 | 15.445519 | 1.888735 | 32571174 |
| Canada | CAN | 2007 | 16.93196 | 0.0435468 | 15.229981 | 1.895259 | 32889025 |
| Canada | CAN | 2008 | 16.51814 | 0.0407468 | 14.829238 | 1.883242 | 33247118 |
| Canada | CAN | 2009 | 15.76760 | 0.0380831 | 14.118647 | 1.838920 | 33628895 |
| Canada | CAN | 2010 | 14.88338 | 0.0340653 | 13.281852 | 1.786430 | 34004889 |
| Canada | CAN | 2011 | 14.59934 | 0.0319160 | 13.030477 | 1.756998 | 34339328 |
| Canada | CAN | 2012 | 13.82968 | 0.0307105 | 12.243601 | 1.764727 | 34714222 |
| Canada | CAN | 2013 | 12.97501 | 0.0288027 | 11.410021 | 1.733997 | 35082954 |
| Canada | CAN | 2014 | 12.61872 | 0.0276959 | 11.032571 | 1.746991 | 35437435 |
| Canada | CAN | 2015 | 12.21793 | 0.0270578 | 10.609097 | 1.763895 | 35702908 |
| Canada | CAN | 2016 | 11.00267 | 0.0251286 | 9.397502 | 1.740834 | 36109487 |
| Canada | CAN | 2017 | 10.71662 | 0.0247705 | 9.110733 | 1.739718 | 36540268 |

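The code that produced the joined tables is not shown. As a minimal sketch, assuming the join is keyed on country and year, the step could look like the following. The sample data frames hold a few rows copied from this report's tables, and `inner_join()` is one reasonable choice here, not necessarily the function the report used.

```r
library(dplyr)

# A few rows from the deaths data (values copied from this report's tables)
deaths_sample <- data.frame(
  country = c("Australia", "Australia", "Canada"),
  year = c(1996L, 1997L, 1996L),
  total_deaths = c(23.04465, 22.43025, 22.18101),
  stringsAsFactors = FALSE
)

# Matching rows from the population data, plus one row with no deaths match
pop_sample <- data.frame(
  Country.Name = c("Australia", "Australia", "Canada", "Canada"),
  Year = c(1996L, 1997L, 1996L, 1997L),
  Count = c(18311000, 18517000, 29610218, 29905948),
  stringsAsFactors = FALSE
)

# inner_join() keeps only country-year pairs that appear in both tables
joined_sample <- inner_join(
  deaths_sample, pop_sample,
  by = c("country" = "Country.Name", "year" = "Year")
)
joined_sample
```

The Canada 1997 population row is dropped because the deaths sample has no matching row. A `right_join()` would keep it and fill the deaths columns with `NA`.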
We also looked at how the data varied by continent.
joined_all <- right_join(deaths_df, world_pop, by=c('country' = 'Country.Name', 'year' = 'Year'))
head(joined_all)
## country acronym year total_deaths indoor_deaths outdoor_deaths
## 1 Afghanistan AFG 1990 299.4773 250.3629 46.44659
## 2 Afghanistan AFG 1991 291.2780 242.5751 46.03384
## 3 Afghanistan AFG 1992 278.9631 232.0439 44.24377
## 4 Afghanistan AFG 1993 278.7908 231.6481 44.44015
## 5 Afghanistan AFG 1994 287.1629 238.8372 45.59433
## 6 Afghanistan AFG 1995 288.0142 239.9066 45.36714
## ozone_deaths Count
## 1 5.616442 12412308
## 2 5.603960 13299017
## 3 5.611822 14485546
## 4 5.655266 15816603
## 5 5.718922 17075727
## 6 5.739174 18110657
north_america <- joined_all %>% filter(country %in% c("United States", "Canada"))
head(na.omit(north_america))
## country acronym year total_deaths indoor_deaths outdoor_deaths ozone_deaths
## 1 Canada CAN 1990 23.74844 0.1461597 21.82110 2.024766
## 2 Canada CAN 1991 23.34036 0.1347912 21.40547 2.046623
## 3 Canada CAN 1992 23.00947 0.1247982 21.06392 2.069720
## 4 Canada CAN 1993 23.03293 0.1191081 21.03444 2.135114
## 5 Canada CAN 1994 22.60288 0.1107671 20.59547 2.152504
## 6 Canada CAN 1995 22.32566 0.1015955 20.28851 2.193303
## Count
## 1 27691138
## 2 28037420
## 3 28371264
## 4 28684764
## 5 29000663
## 6 29302311
south_america <- joined_all %>% filter(country %in% c("Brazil", "Chile"))
head(na.omit(south_america))
## country acronym year total_deaths indoor_deaths outdoor_deaths ozone_deaths
## 1 Brazil BRA 1990 74.96820 44.08928 28.36460 3.330584
## 2 Brazil BRA 1991 71.52505 41.12989 27.91653 3.272506
## 3 Brazil BRA 1992 69.97594 39.07269 28.37737 3.321153
## 4 Brazil BRA 1993 69.34644 37.34668 29.37063 3.439490
## 5 Brazil BRA 1994 66.74580 34.60871 29.48986 3.445359
## 6 Brazil BRA 1995 63.54859 31.67095 29.22721 3.430127
## Count
## 1 149003223
## 2 151648011
## 3 154259380
## 4 156849078
## 5 159432716
## 6 162019896
africa <- joined_all %>% filter(country %in% c("Nigeria", "Malawi"))
head(na.omit(africa))
## country acronym year total_deaths indoor_deaths outdoor_deaths ozone_deaths
## 1 Malawi MWI 1990 167.7156 153.3657 12.60813 3.518561
## 2 Malawi MWI 1991 167.8769 153.3428 12.77371 3.541273
## 3 Malawi MWI 1992 171.1963 156.2008 13.19234 3.618770
## 4 Malawi MWI 1993 175.2565 159.9608 13.45895 3.686304
## 5 Malawi MWI 1994 180.9753 164.9773 14.10506 3.784780
## 6 Malawi MWI 1995 183.4036 166.9812 14.48956 3.847709
## Count
## 1 9404500
## 2 9600355
## 3 9685973
## 4 9710331
## 5 9745690
## 6 9844415
europe <- joined_all %>% filter(country %in% c("Germany", "Serbia"))
head(na.omit(europe))
## country acronym year total_deaths indoor_deaths outdoor_deaths ozone_deaths
## 1 Germany DEU 1990 41.91322 1.600590 38.11494 2.724651
## 2 Germany DEU 1991 40.73815 1.472532 37.08854 2.694316
## 3 Germany DEU 1992 38.94425 1.367432 35.45345 2.622836
## 4 Germany DEU 1993 38.25349 1.275528 34.85003 2.623219
## 5 Germany DEU 1994 36.85860 1.182584 33.58411 2.573705
## 6 Germany DEU 1995 35.66449 1.109101 32.47285 2.557293
## Count
## 1 79433029
## 2 80013896
## 3 80624598
## 4 81156363
## 5 81438348
## 6 81678051
asia <- joined_all %>% filter(country %in% c("Pakistan", "Sri Lanka"))
head(na.omit(asia))
## country acronym year total_deaths indoor_deaths outdoor_deaths ozone_deaths
## 1 Pakistan PAK 1990 144.7155 104.4196 34.80304 10.09603
## 2 Pakistan PAK 1991 148.0120 105.5436 36.80428 10.35961
## 3 Pakistan PAK 1992 148.6560 105.2133 37.76577 10.35540
## 4 Pakistan PAK 1993 149.6526 104.9854 38.95704 10.37194
## 5 Pakistan PAK 1994 151.1992 105.3557 40.06784 10.44016
## 6 Pakistan PAK 1995 154.9523 107.2959 41.72728 10.67907
## Count
## 1 107647921
## 2 110778648
## 3 113911126
## 4 117086685
## 5 120362762
## 6 123776839
oceania <- joined_all %>% filter(country %in% c("Australia", "New Zealand"))
head(na.omit(oceania))
## country acronym year total_deaths indoor_deaths outdoor_deaths ozone_deaths
## 1 Australia AUS 1990 26.70503 0.6924006 25.72983 0.3285590
## 2 Australia AUS 1991 25.91503 0.6172074 25.02097 0.3222915
## 3 Australia AUS 1992 25.70745 0.5594191 24.86599 0.3286297
## 4 Australia AUS 1993 24.63559 0.4920491 23.86602 0.3232958
## 5 Australia AUS 1994 24.38185 0.4454673 23.65269 0.3300999
## 6 Australia AUS 1995 23.10038 0.3895721 22.43122 0.3244735
## Count
## 1 17065100
## 2 17284000
## 3 17495000
## 4 17667000
## 5 17855000
## 6 18072000
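The six `filter()` calls above repeat the same pattern. One alternative, sketched here under the assumption that a single data frame with a `continent` column would serve the same purpose, is a named lookup vector. The names `continent_lookup`, `joined_sample`, and `by_continent` are illustrative and do not appear in the report's code.

```r
library(dplyr)

# Named vector mapping each selected country to its continent
continent_lookup <- c(
  "United States" = "North America", "Canada" = "North America",
  "Brazil" = "South America", "Chile" = "South America",
  "Nigeria" = "Africa", "Malawi" = "Africa",
  "Germany" = "Europe", "Serbia" = "Europe",
  "Pakistan" = "Asia", "Sri Lanka" = "Asia",
  "Australia" = "Oceania", "New Zealand" = "Oceania"
)

# Small stand-in for joined_all (1990 values copied from the output above)
joined_sample <- data.frame(
  country = c("Canada", "Brazil", "Afghanistan"),
  year = c(1990L, 1990L, 1990L),
  total_deaths = c(23.74844, 74.96820, 299.4773),
  stringsAsFactors = FALSE
)

# Keep only the selected countries and label each row with its continent
by_continent <- joined_sample %>%
  filter(country %in% names(continent_lookup)) %>%
  mutate(continent = unname(continent_lookup[country]))
by_continent
```

Later summaries could then use `group_by(continent, country)` instead of one data frame per continent.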
This is a closer view of population growth over time in the high- and
low-population countries that we selected.
This graph shows the population change over time. Note that the lines
for Germany and Australia seem relatively flat, but a closer look shows
a gradual increase that is much slower than in the other countries.

These graphs show the same information, but we have mapped the
percentage of air-pollution-related deaths to the width of the line to
show visually whether deaths increased or decreased over time. It is
easier to see in the high-population countries, but
air-pollution-related deaths do decrease over time.
Death Count
Which country has the highest average death count?
Let’s make a table depicting the high- and low-population countries and
their respective death counts due to pollution.

| country | hp_average_death |
| --- | --- |
| Australia | 17.76815 |
| Brazil | 48.42928 |
| Germany | 28.10988 |
| Nigeria | 112.30157 |
| Pakistan | 144.33463 |
| United States | 26.35827 |

| country | lp_average_death |
| --- | --- |
| Canada | 18.18542 |
| Chile | 36.51321 |
| Malawi | 147.77167 |
| New Zealand | 15.92536 |
| Serbia | 80.66558 |
| Sri Lanka | 69.60383 |

We wanted to take a closer look at the death count and see which
country has the highest average. The tables show that, among the
high-population countries, Pakistan had the highest average death count
at 144.33 deaths per 100,000. Among the low-population countries, Malawi
had the highest at 147.77, which is higher than Pakistan.
Let’s see how this is different from continent to continent
#Mean total deaths for each continent
deaths_north <- na.omit(north_america) %>%
  group_by(country) %>%
  summarize(north_america_deaths = mean(total_deaths))
deaths_south <- na.omit(south_america) %>%
  group_by(country) %>%
  summarize(south_america_deaths = mean(total_deaths))
deaths_africa <- na.omit(africa) %>%
  group_by(country) %>%
  summarize(africa_deaths = mean(total_deaths))
deaths_europe <- na.omit(europe) %>%
  group_by(country) %>%
  summarize(europe_deaths = mean(total_deaths))
deaths_asia <- na.omit(asia) %>%
  group_by(country) %>%
  summarize(asia_deaths = mean(total_deaths))
deaths_oceania <- na.omit(oceania) %>%
  group_by(country) %>%
  summarize(oceania_deaths = mean(total_deaths))
#Table to view continent deaths
kable(deaths_north, caption = "North America Average Death Count")
North America Average Death Count

| country | north_america_deaths |
| --- | --- |
| Canada | 18.18542 |
| United States | 26.35827 |

kable(deaths_south, caption = "South America Average Death Count")
South America Average Death Count

| country | south_america_deaths |
| --- | --- |
| Brazil | 48.42928 |
| Chile | 36.51321 |

kable(deaths_africa, caption = "Africa Average Death Count")
Africa Average Death Count

| country | africa_deaths |
| --- | --- |
| Malawi | 147.7717 |
| Nigeria | 112.3016 |

kable(deaths_asia, caption = "Asia Average Death Count")
Asia Average Death Count

| country | asia_deaths |
| --- | --- |
| Pakistan | 144.33463 |
| Sri Lanka | 69.60383 |

kable(deaths_europe, caption = "Europe Average Death Count")
Europe Average Death Count

| country | europe_deaths |
| --- | --- |
| Germany | 28.10988 |
| Serbia | 80.66558 |

kable(deaths_oceania, caption = "Oceania Average Death Count")
Oceania Average Death Count

| country | oceania_deaths |
| --- | --- |
| Australia | 17.76815 |
| New Zealand | 15.92536 |

Looking at the average death count by continent, the Oceania countries
had the fewest deaths overall: Australia averaged roughly 17.8 and New
Zealand 15.9. The African countries had the most: Malawi averaged 147.8
and Nigeria 112.3.
Here’s a graph to clearly visualize the previous table
To get a better visualization we created a bar graph of the average
deaths in both the high and low populated countries. In this
high-population graph you can see that Pakistan is at the highest and
Australia is at the lowest. In the low-population graph you can see that
Malawi is at the highest and New Zealand is at the lowest.
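The plotting code for the bar graphs is not shown. A minimal ggplot2 sketch for the high-population graph, using the averages from the table earlier in this section, might look like the following. The object names `hp_avg` and `hp_bar` and the axis labels are assumptions.

```r
library(ggplot2)

# Average deaths per 100,000 for the high-population countries (from this report)
hp_avg <- data.frame(
  country = c("Australia", "Brazil", "Germany", "Nigeria", "Pakistan", "United States"),
  hp_average_death = c(17.76815, 48.42928, 28.10988, 112.30157, 144.33463, 26.35827),
  stringsAsFactors = FALSE
)

# Horizontal bars ordered by average so the highest and lowest stand out
hp_bar <- ggplot(
  hp_avg,
  aes(x = reorder(country, hp_average_death), y = hp_average_death)
) +
  geom_col() +
  coord_flip() +
  labs(x = "Country", y = "Average deaths per 100,000")

hp_bar
```

The same pattern applies to the low-population countries by swapping in their averages.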
So we’ve looked at the deaths due to pollution, but what percentage
of the population was affected?
In order to get rid of the leading zeros, and clean up the y-axis, we
multiplied the ‘percent_high’ and ‘percent_low’ by 100,000 since the
data was per 100,000 when calculating deaths.
| Country.Name | average_population |
| --- | --- |
| Australia | 21085646 |
| Brazil | 188017856 |
| Germany | 81914553 |
| Nigeria | 146828087 |
| Pakistan | 166653684 |
| United States | 299036073 |

| Country.Name | average_population |
| --- | --- |
| Canada | 32874340 |
| Chile | 16466330 |
| Malawi | 13442531 |
| New Zealand | 4193041 |
| Serbia | 7358242 |
| Sri Lanka | 19758408 |

Now that we’ve looked at the deaths due to pollution, we wanted to see
what percentage of the population was actually affected. The tables
above show the average populations of the high- and low-population
countries. Among the high-population countries Pakistan leads with
1.21%, and among the low-population countries Malawi leads with 13.1%.
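As a point of reference, a rate expressed per 100,000 people can be converted to a percentage of the population directly. The sketch below shows only that unit conversion, applied to the average rates from this report's tables. It is not necessarily the calculation behind the percentages quoted above, which also involved the average populations.

```r
# Average death rates per 100,000 people, copied from this report's tables
avg_rate_per_100k <- c(Pakistan = 144.33463, Malawi = 147.77167)

# Dividing by 100,000 gives a proportion; multiplying by 100 gives a percentage
percent_of_population <- avg_rate_per_100k / 100000 * 100

round(percent_of_population, 3)
```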
Pollution Types
Which type of pollution has the greatest number of deaths?
| country | avg_indoor | avg_outdoor | avg_ozone |
| --- | --- | --- | --- |
| Pakistan | 87.7427944 | 50.52063 | 10.440656 |
| Nigeria | 75.8755074 | 35.21678 | 2.117076 |
| Brazil | 19.4258385 | 26.84194 | 2.740342 |
| Germany | 0.7170881 | 25.47078 | 2.343892 |
| Australia | 0.2485867 | 17.20789 | 0.360452 |
| United States | 0.1656402 | 22.79947 | 3.915093 |

| country | avg_indoor | avg_outdoor | avg_ozone |
| --- | --- | --- | --- |
| Canada | 0.0651156 | 16.38423 | 1.9697041 |
| Chile | 8.6932699 | 27.17442 | 0.8504919 |
| Malawi | 132.1891749 | 13.81151 | 3.3870514 |
| New Zealand | 0.2908622 | 15.56872 | 0.0727512 |
| Serbia | 35.8762796 | 42.71254 | 2.9395671 |
| Sri Lanka | 44.5428441 | 24.77233 | 0.4304406 |

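To answer the question per country, one option is to reshape the averages to long form and keep the largest value for each country. The sketch below is an illustration using three rows copied from the tables above. The names `pollution_sample` and `leading_type` are assumptions, and the code relies on `pivot_longer()` from tidyr and `slice_max()` from dplyr (version 1.0.0 or later).

```r
library(dplyr)
library(tidyr)

# Three rows copied from the pollution-type tables in this report
pollution_sample <- data.frame(
  country = c("Pakistan", "Germany", "Malawi"),
  avg_indoor = c(87.7427944, 0.7170881, 132.1891749),
  avg_outdoor = c(50.52063, 25.47078, 13.81151),
  avg_ozone = c(10.440656, 2.343892, 3.3870514),
  stringsAsFactors = FALSE
)

# One row per country and pollution type, then keep the largest per country
leading_type <- pollution_sample %>%
  pivot_longer(
    cols = starts_with("avg_"),
    names_to = "pollution_type",
    values_to = "avg_deaths"
  ) %>%
  group_by(country) %>%
  slice_max(avg_deaths, n = 1) %>%
  ungroup()

leading_type
```

In this sample indoor pollution leads for Pakistan and Malawi, while outdoor pollution leads for Germany.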
Pollution Over Time
Let’s look at the previous two decades and compare the death count.
Has there been a change?
This is the first decade 1996-2006
| country | High_Deaths_96 | High_Deaths_01 | High_Deaths_06 |
| --- | --- | --- | --- |
| Australia | 23.04465 | 18.58572 | 14.92239 |
| Brazil | 60.67757 | 49.46436 | 41.46829 |
| Germany | 34.72325 | 28.38756 | 23.83654 |
| Nigeria | 136.08978 | 123.05129 | 102.26653 |
| Pakistan | 155.42988 | 151.25352 | 146.09296 |
| United States | 29.99271 | 28.93114 | 25.93369 |

| country | Low_Deaths_96 | Low_Deaths_01 | Low_Deaths_06 |
| --- | --- | --- | --- |
| Canada | 22.18101 | 19.82451 | 17.14391 |
| Chile | 46.36829 | 37.43188 | 30.99058 |
| Malawi | 183.14179 | 165.41702 | 137.54033 |
| Serbia | 93.44700 | 83.18333 | 79.04236 |
| Sri Lanka | 85.28997 | 72.16239 | 66.04455 |
| Tonga | 100.66078 | 95.27073 | 88.65608 |

This is the second decade 2007-2017
| country | High_Deaths_07 | High_Deaths_12 | High_Deaths_17 |
| --- | --- | --- | --- |
| Australia | 14.92140 | 12.65973 | 10.79595 |
| Brazil | 40.42460 | 35.39069 | 30.32108 |
| Germany | 23.45850 | 20.91536 | 19.82826 |
| Nigeria | 98.90306 | 84.22324 | 81.22147 |
| Pakistan | 143.81724 | 133.93887 | 123.21548 |
| United States | 25.11756 | 21.98194 | 18.82515 |

| country | Low_Deaths_07 | Low_Deaths_12 | Low_Deaths_17 |
| --- | --- | --- | --- |
| Canada | 16.93196 | 13.82968 | 10.71662 |
| Chile | 30.53130 | 27.31475 | 24.29921 |
| Malawi | 132.12253 | 116.27470 | 104.93508 |
| Serbia | 76.65752 | 72.77354 | 62.57853 |
| Sri Lanka | 66.05987 | 59.22433 | 38.46264 |
| Tonga | 87.81178 | 79.49336 | 70.72940 |

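One way to summarize the change across both decades is the percent change between the first and last years shown. The sketch below computes it in base R for two countries, using the 1996 and 2017 values from the tables above. The column names `deaths_1996`, `deaths_2017`, and `pct_change` are assumptions made for illustration.

```r
# 1996 and 2017 death rates per 100,000, copied from the tables above
decade_sample <- data.frame(
  country = c("Australia", "Pakistan"),
  deaths_1996 = c(23.04465, 155.42988),
  deaths_2017 = c(10.79595, 123.21548),
  stringsAsFactors = FALSE
)

# Percent change relative to the 1996 value; negative values mean a decrease
decade_sample$pct_change <-
  (decade_sample$deaths_2017 - decade_sample$deaths_1996) /
  decade_sample$deaths_1996 * 100

decade_sample
```

Both values come out negative. Australia's rate falls by more than half, and Pakistan's falls by roughly a fifth.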
Let’s see if there is variation by continent. Here are some tables
for the first decade (1996-2006) and second decade (2007-2017) grouped
by continent.
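The per-continent code that follows builds each year's column with a separate filter-and-summarize step. As an alternative sketch, assuming one row per country and year, the same three columns can be produced in a single pipeline with `pivot_wider()` from tidyr. The sample values are the North America figures from this report, and the names `north_sample` and `north_wide` are illustrative.

```r
library(dplyr)
library(tidyr)

# North America rows for the three reference years (values from this report)
north_sample <- data.frame(
  country = rep(c("Canada", "United States"), each = 3),
  year = rep(c(1996L, 2001L, 2006L), times = 2),
  total_deaths = c(22.18101, 19.82451, 17.14391, 29.99271, 28.93114, 25.93369),
  stringsAsFactors = FALSE
)

# Keep the reference years, then spread them into one column per year
north_wide <- north_sample %>%
  filter(year %in% c(1996, 2001, 2006)) %>%
  select(country, year, total_deaths) %>%
  pivot_wider(
    names_from = year,
    values_from = total_deaths,
    names_prefix = "avg_deaths_"
  )

north_wide
```

This yields one table with columns `avg_deaths_1996`, `avg_deaths_2001`, and `avg_deaths_2006` instead of three separate two-column tables.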
#North America 1996-2006
north_96 <- na.omit(north_america) %>%
  group_by(country) %>%
  filter(year == 1996) %>%
  summarize(avg_deaths_96 = mean(total_deaths))
north_01 <- na.omit(north_america) %>%
  group_by(country) %>%
  filter(year == 2001) %>%
  summarize(avg_deaths_01 = mean(total_deaths))
north_06 <- na.omit(north_america) %>%
  group_by(country) %>%
  filter(year == 2006) %>%
  summarize(avg_deaths_06 = mean(total_deaths))
kable(list(north_96,north_01,north_06), caption = "North America Deaths 1996-2006")
North America Deaths 1996-2006

| country | avg_deaths_96 |
| --- | --- |
| Canada | 22.18101 |
| United States | 29.99271 |

| country | avg_deaths_01 |
| --- | --- |
| Canada | 19.82451 |
| United States | 28.93114 |

| country | avg_deaths_06 |
| --- | --- |
| Canada | 17.14391 |
| United States | 25.93369 |

# North America 2007-2017
north_07 <- na.omit(north_america) %>%
  group_by(country) %>%
  filter(year == 2007) %>%
  summarize(avg_deaths_07 = mean(total_deaths))
north_12 <- na.omit(north_america) %>%
  group_by(country) %>%
  filter(year == 2012) %>%
  summarize(avg_deaths_12 = mean(total_deaths))
north_17 <- na.omit(north_america) %>%
  group_by(country) %>%
  filter(year == 2017) %>%
  summarize(avg_deaths_17 = mean(total_deaths))
kable(list(north_07,north_12,north_17), caption = "North America Deaths 2007-2017")
North America Deaths 2007-2017

| country | avg_deaths_07 |
| --- | --- |
| Canada | 16.93196 |
| United States | 25.11756 |

| country | avg_deaths_12 |
| --- | --- |
| Canada | 13.82968 |
| United States | 21.98194 |

| country | avg_deaths_17 |
| --- | --- |
| Canada | 10.71662 |
| United States | 18.82515 |

#South America 1996-2006
south_96 <- na.omit(south_america) %>%
  group_by(country) %>%
  filter(year == 1996) %>%
  summarize(avg_deaths_96 = mean(total_deaths))
south_01 <- na.omit(south_america) %>%
  group_by(country) %>%
  filter(year == 2001) %>%
  summarize(avg_deaths_01 = mean(total_deaths))
south_06 <- na.omit(south_america) %>%
  group_by(country) %>%
  filter(year == 2006) %>%
  summarize(avg_deaths_06 = mean(total_deaths))
kable(list(south_96,south_01,south_06), caption = "South America Deaths 1996-2006")
South America Deaths 1996-2006

| country | avg_deaths_96 |
| --- | --- |
| Brazil | 60.67757 |
| Chile | 46.36829 |

| country | avg_deaths_01 |
| --- | --- |
| Brazil | 49.46436 |
| Chile | 37.43188 |

| country | avg_deaths_06 |
| --- | --- |
| Brazil | 41.46829 |
| Chile | 30.99058 |

# South America 2007-2017
south_07 <- na.omit(south_america) %>%
  group_by(country) %>%
  filter(year == 2007) %>%
  summarize(avg_deaths_07 = mean(total_deaths))
south_12 <- na.omit(south_america) %>%
  group_by(country) %>%
  filter(year == 2012) %>%
  summarize(avg_deaths_12 = mean(total_deaths))
south_17 <- na.omit(south_america) %>%
  group_by(country) %>%
  filter(year == 2017) %>%
  summarize(avg_deaths_17 = mean(total_deaths))
kable(list(south_07,south_12,south_17), caption = "South America Deaths 2007-2017")
South America Deaths 2007-2017

| country | avg_deaths_07 |
| --- | --- |
| Brazil | 40.4246 |
| Chile | 30.5313 |

| country | avg_deaths_12 |
| --- | --- |
| Brazil | 35.39069 |
| Chile | 27.31475 |

| country | avg_deaths_17 |
| --- | --- |
| Brazil | 30.32108 |
| Chile | 24.29921 |

# Africa 1996-2006
africa_96 <- na.omit(africa) %>%
  group_by(country) %>%
  filter(year == 1996) %>%
  summarize(avg_deaths_96 = mean(total_deaths))
africa_01 <- na.omit(africa) %>%
  group_by(country) %>%
  filter(year == 2001) %>%
  summarize(avg_deaths_01 = mean(total_deaths))
africa_06 <- na.omit(africa) %>%
  group_by(country) %>%
  filter(year == 2006) %>%
  summarize(avg_deaths_06 = mean(total_deaths))
kable(list(africa_96,africa_01,africa_06), caption = "Africa Deaths 1996-2006")
Africa Deaths 1996-2006

| country | avg_deaths_96 |
| --- | --- |
| Malawi | 183.1418 |
| Nigeria | 136.0898 |

| country | avg_deaths_01 |
| --- | --- |
| Malawi | 165.4170 |
| Nigeria | 123.0513 |

| country | avg_deaths_06 |
| --- | --- |
| Malawi | 137.5403 |
| Nigeria | 102.2665 |

# Africa 2007-2017
africa_07 <- na.omit(africa) %>%
  group_by(country) %>%
  filter(year == 2007) %>%
  summarize(avg_deaths_07 = mean(total_deaths))
africa_12 <- na.omit(africa) %>%
  group_by(country) %>%
  filter(year == 2012) %>%
  summarize(avg_deaths_12 = mean(total_deaths))
africa_17 <- na.omit(africa) %>%
  group_by(country) %>%
  filter(year == 2017) %>%
  summarize(avg_deaths_17 = mean(total_deaths))
kable(list(africa_07,africa_12,africa_17), caption = "Africa Deaths 2007-2017")
Africa Deaths 2007-2017

| country | avg_deaths_07 |
| --- | --- |
| Malawi | 132.12253 |
| Nigeria | 98.90306 |

| country | avg_deaths_12 |
| --- | --- |
| Malawi | 116.27470 |
| Nigeria | 84.22324 |

| country | avg_deaths_17 |
| --- | --- |
| Malawi | 104.93508 |
| Nigeria | 81.22147 |

#Europe 1996-2006
europe_96 <- na.omit(europe) %>%
group_by(country) %>%
filter(year == 1996) %>%
summarize(avg_deaths_96 = mean(total_deaths))
europe_01 <- na.omit(europe) %>%
group_by(country) %>%
filter(year == 2001) %>%
summarize(avg_deaths_01 = mean(total_deaths))
europe_06 <- na.omit(europe) %>%
group_by(country) %>%
filter(year == 2006) %>%
summarize(avg_deaths_06 = mean(total_deaths))
kable(list(europe_96,europe_01,europe_06), caption = "Europe Deaths 1996-2006")
Europe Deaths 1996-2006
|
country
|
avg_deaths_96
|
|
Germany
|
34.72325
|
|
Serbia
|
93.44700
|
|
|
country
|
avg_deaths_01
|
|
Germany
|
28.38756
|
|
Serbia
|
83.18333
|
|
|
country
|
avg_deaths_06
|
|
Germany
|
23.83654
|
|
Serbia
|
79.04236
|
|
# Europe 2007-2017
europe_07 <- na.omit(europe) %>%
  group_by(country) %>%
  filter(year == 2007) %>%
  summarize(avg_deaths_07 = mean(total_deaths))
europe_12 <- na.omit(europe) %>%
  group_by(country) %>%
  filter(year == 2012) %>%
  summarize(avg_deaths_12 = mean(total_deaths))
europe_17 <- na.omit(europe) %>%
  group_by(country) %>%
  filter(year == 2017) %>%
  summarize(avg_deaths_17 = mean(total_deaths))
kable(list(europe_07, europe_12, europe_17), caption = "Europe Deaths 2007-2017")
Europe Deaths 2007-2017
| country | avg_deaths_07 | avg_deaths_12 | avg_deaths_17 |
|---|---|---|---|
| Germany | 23.45850 | 20.91536 | 19.82826 |
| Serbia | 76.65752 | 72.77354 | 62.57853 |
# Asia 1996-2006
asia_96 <- na.omit(asia) %>%
  group_by(country) %>%
  filter(year == 1996) %>%
  summarize(avg_deaths_96 = mean(total_deaths))
asia_01 <- na.omit(asia) %>%
  group_by(country) %>%
  filter(year == 2001) %>%
  summarize(avg_deaths_01 = mean(total_deaths))
asia_06 <- na.omit(asia) %>%
  group_by(country) %>%
  filter(year == 2006) %>%
  summarize(avg_deaths_06 = mean(total_deaths))
kable(list(asia_96, asia_01, asia_06), caption = "Asia Deaths 1996-2006")
Asia Deaths 1996-2006
| country | avg_deaths_96 | avg_deaths_01 | avg_deaths_06 |
|---|---|---|---|
| Pakistan | 155.42988 | 151.25352 | 146.09296 |
| Sri Lanka | 85.28997 | 72.16239 | 66.04455 |
# Asia 2007-2017
asia_07 <- na.omit(asia) %>%
  group_by(country) %>%
  filter(year == 2007) %>%
  summarize(avg_deaths_07 = mean(total_deaths))
asia_12 <- na.omit(asia) %>%
  group_by(country) %>%
  filter(year == 2012) %>%
  summarize(avg_deaths_12 = mean(total_deaths))
asia_17 <- na.omit(asia) %>%
  group_by(country) %>%
  filter(year == 2017) %>%
  summarize(avg_deaths_17 = mean(total_deaths))
kable(list(asia_07, asia_12, asia_17), caption = "Asia Deaths 2007-2017")
Asia Deaths 2007-2017
| country | avg_deaths_07 | avg_deaths_12 | avg_deaths_17 |
|---|---|---|---|
| Pakistan | 143.81724 | 133.93887 | 123.21548 |
| Sri Lanka | 66.05987 | 59.22433 | 38.46264 |
# Oceania 1996-2006
oceania_96 <- na.omit(oceania) %>%
  group_by(country) %>%
  filter(year == 1996) %>%
  summarize(avg_deaths_96 = mean(total_deaths))
oceania_01 <- na.omit(oceania) %>%
  group_by(country) %>%
  filter(year == 2001) %>%
  summarize(avg_deaths_01 = mean(total_deaths))
oceania_06 <- na.omit(oceania) %>%
  group_by(country) %>%
  filter(year == 2006) %>%
  summarize(avg_deaths_06 = mean(total_deaths))
kable(list(oceania_96, oceania_01, oceania_06), caption = "Oceania Deaths 1996-2006")
Oceania Deaths 1996-2006
| country | avg_deaths_96 | avg_deaths_01 | avg_deaths_06 |
|---|---|---|---|
| Australia | 23.04465 | 18.58572 | 14.92239 |
| New Zealand | 21.15988 | 16.91014 | 13.76706 |
# Oceania 2007-2017
oceania_07 <- na.omit(oceania) %>%
  group_by(country) %>%
  filter(year == 2007) %>%
  summarize(avg_deaths_07 = mean(total_deaths))
oceania_12 <- na.omit(oceania) %>%
  group_by(country) %>%
  filter(year == 2012) %>%
  summarize(avg_deaths_12 = mean(total_deaths))
oceania_17 <- na.omit(oceania) %>%
  group_by(country) %>%
  filter(year == 2017) %>%
  summarize(avg_deaths_17 = mean(total_deaths))
kable(list(oceania_07, oceania_12, oceania_17), caption = "Oceania Deaths 2007-2017")
Oceania Deaths 2007-2017
| country | avg_deaths_07 | avg_deaths_12 | avg_deaths_17 |
|---|---|---|---|
| Australia | 14.92140 | 12.65973 | 10.795952 |
| New Zealand | 13.58658 | 10.91224 | 8.598757 |
Let’s graph the previous tables!
This shows the first decade, 1996-2006.
This shows the second decade, 2007-2017.
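The plots themselves are not reproduced in this text. As a rough sketch of how one such figure could be drawn with ggplot2, the block below assumes long-format snapshot data. The `snap_demo` frame is a stand-in built from the Africa and Asia table values for 1996-2006, and the geoms, title, and labels are illustrative choices, not the original plot settings.

```r
library(ggplot2)

# Stand-in data: table values for the first decade (deaths per 100,000)
snap_demo <- data.frame(
  country = rep(c("Malawi", "Nigeria", "Pakistan", "Sri Lanka"), each = 3),
  year = rep(c(1996, 2001, 2006), times = 4),
  avg_deaths = c(183.1418, 165.4170, 137.5403,
                 136.0898, 123.0513, 102.2665,
                 155.42988, 151.25352, 146.09296,
                 85.28997, 72.16239, 66.04455)
)

# One line per country across the three snapshot years
p_first_decade <- ggplot(snap_demo,
                         aes(x = year, y = avg_deaths, color = country)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = c(1996, 2001, 2006)) +
  labs(title = "Air pollution deaths, 1996-2006",
       x = "Year", y = "Deaths per 100,000", color = "Country")

p_first_decade
```

Passing `p_first_decade` to `plotly::ggplotly()` would give an interactive version, since the report already loads plotly.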
By comparing each pollutant type, we can determine which year and
country had the highest number of deaths.
Indoor Deaths
Outdoor Deaths
Ozone Deaths
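The per-pollutant comparison amounts to finding, for each pollutant type, the country and year with the largest value. Here is a minimal sketch, assuming the renamed columns are called `indoor_deaths`, `outdoor_deaths`, and `ozone_deaths` (these exact names are an assumption). The numbers in `pollutant_demo` are made up purely for illustration and do not come from the data set.

```r
library(dplyr)
library(tidyr)

# Made-up values for illustration only; NOT taken from the deaths data
pollutant_demo <- data.frame(
  country = c("A", "A", "B", "B"),
  year = c(1996, 1997, 1996, 1997),
  indoor_deaths = c(90, 85, 40, 42),
  outdoor_deaths = c(30, 35, 60, 55),
  ozone_deaths = c(5, 4, 7, 8)
)

# Reshape to one row per country-year-pollutant, then keep the
# single largest value within each pollutant type
peak_by_pollutant <- pollutant_demo %>%
  pivot_longer(cols = c(indoor_deaths, outdoor_deaths, ozone_deaths),
               names_to = "pollutant", values_to = "deaths") %>%
  group_by(pollutant) %>%
  slice_max(deaths, n = 1, with_ties = FALSE) %>%
  ungroup()

peak_by_pollutant
```

With real data, the same long format can be reused for the per-pollutant plots by mapping `pollutant` to a facet or fill.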
Which is worse - outdoor or indoor pollution?
Let’s reintroduce a graph we looked at earlier. This time, however, we
will combine the pollutant types.
We cannot conclude which is worse.
- High-population countries:
  - Outdoor pollution seems to be more detrimental, with the exception
    of two countries in this sample set.
- Low-population countries:
Summary
- Which country has the highest average death count?
  - High Population: Pakistan
  - Low Population: Malawi
- Has the percentage of the affected population decreased or increased
  over time?
  - Generally, it is decreasing for both high- and low-population
    countries.
- Which pollutant type has the greatest number of deaths?
  - High Population: Indoor Pollution
  - Low Population: Indoor Pollution
- How has the death count changed over the past two decades?
  - 1996-2006:
    - High Population: Decreases
    - Low Population: Decreases
  - 2007-2017:
    - High Population: Decreases
    - Low Population: Decreases
- Which year and country had the highest number of deaths per
  pollutant type?
  - We looked at the years 1996-2017.
  - Indoor: Pakistan and Malawi were mainly affected in 1996.
  - Outdoor: Serbia and Pakistan were the top countries.
    - 2011 was the worst year for Pakistan.
    - 1997 was the worst year for Serbia.
    - Sri Lanka and Tonga increased, but Sri Lanka had a steep decrease
      after 2015.
  - Outdoor Ozone: Pakistan and Malawi were the top countries.
    - 1997 was the worst year for Pakistan.
    - 1998 was the worst year for Malawi.
    - The United States had the second-highest number of deaths among
      the higher-population countries.
    - Pakistan decreased and then slightly increased.
- Which is worse - outdoor or indoor pollution?
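The "decreases" answers for both decades can be checked directly from the snapshot tables. This is a minimal sketch that computes the percent change between the first and last snapshot year of each decade. The values are copied from the tables earlier in this section (deaths per 100,000), and the `d1996`, `d2006`, `d2007`, `d2017`, and `pct_change_*` column names are illustrative rather than taken from the original analysis.

```r
library(dplyr)

# Values copied from the snapshot tables (deaths per 100,000)
change_demo <- data.frame(
  country = c("Malawi", "Nigeria", "Germany", "Serbia",
              "Pakistan", "Sri Lanka", "Australia", "New Zealand"),
  d1996 = c(183.1418, 136.0898, 34.72325, 93.44700,
            155.42988, 85.28997, 23.04465, 21.15988),
  d2006 = c(137.5403, 102.2665, 23.83654, 79.04236,
            146.09296, 66.04455, 14.92239, 13.76706),
  d2007 = c(132.12253, 98.90306, 23.45850, 76.65752,
            143.81724, 66.05987, 14.92140, 13.58658),
  d2017 = c(104.93508, 81.22147, 19.82826, 62.57853,
            123.21548, 38.46264, 10.795952, 8.598757)
)

# Percent change within each decade; negative means a decrease
change_demo <- change_demo %>%
  mutate(
    pct_change_96_06 = 100 * (d2006 - d1996) / d1996,
    pct_change_07_17 = 100 * (d2017 - d2007) / d2007
  )

change_demo %>% select(country, pct_change_96_06, pct_change_07_17)
```

Every country in this sample shows a negative change in both decades, which is consistent with the summary above.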